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Perception is an active process that interprets and structures the stimulus input based on 
assumptions about its possible causes. We use real-time functional magnetic resonance 
imaging (rtfMRI) to investigate a particularly powerful demonstration of dynamic object 
integration in which the same physical stimulus intermittently elicits categorically different 
conscious object percepts. In this study, we simulated an outline object that is moving 
behind a narrow slit. With such displays, the physically identical stimulus can elicit 
categorically different percepts that either correspond closely to the physical stimulus 
(vertically moving line segments) or represent a hypothesis about the underlying cause 
of the physical stimulus (a horizontally moving object that is partly occluded). In the latter 
case, the brain must construct an object from the input sequence. Combining rtfMRI 
with machine learning techniques we show that it is possible to determine online the 
momentary state of a subject's conscious percept from time resolved BOLD-activity. 
In addition, we found that feedback about the currently decoded percept increased 
the decoding rates compared to prior fMRI recordings of the same stimulus without 
feedback presentation. The analysis of the trained classifier revealed a brain network that 
discriminates contents of conscious perception with antagonistic interactions between 
early sensory areas that represent physical stimulus properties and higher-tier brain areas. 
During integrated object percepts, brain activity decreases in early sensory areas and 
increases in higher-tier areas. We conclude that it is possible to use BOLD responses 
to reliably track the contents of conscious visual perception with a relatively high temporal 
resolution. We suggest that our approach can also be used to investigate the neural basis 
of auditory object formation and discuss the results in the context of predictive coding 
theory. 
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INTRODUCTION 

Human cognitive neuroscience makes the strong assump- 
tion that all subjective human experience is tightly linked 
to spatiotemporal patterns of a neuronal activation. For sen- 
sory perception this assumption predicts that patterns of brain 
activity should not only reflect the physical stimulus but also 
the content of perceptual awareness derived from the physi- 
cal input (von Helmholtz, 1867). Perceptually ambiguous stim- 
uli, in which the same physical stimulus elicits categorically 
different and mutually exclusive percepts, provide an excellent 
opportunity to investigate constructive brain processes under- 
lying the formation and the maintenance of subjective percep- 
tual experiences. The neural correlates of different ambiguous 



visual stimuli have been studied using fMRI, including binocular 
rivalry (Tong et al., 1998; Haynes and Rees, 2005), ambigu- 
ous static figures (Kleinschmidt et al., 1998), structure-from- 
motion (Brouwer and van Ee, 2007; Freeman et al., 2012), 
occluded moving drawings (Murray et al, 2002; Fang et al., 
2008), and moving plaids (Castelo-Branco et al., 2002). It has 
been suggested that a strong link between dynamically chang- 
ing brain activation patterns and subjective perceptual states 
can be established by predicting different states of perception 
from brain activation (Cox and Savoy, 2003; Rieger et al., 2008; 
Chang et al., 2010). However, none of the previous fMRI- 
studies on perceptual organization of ambiguous stimuli made 
a serious attempt to quantify the relevance of the observed 
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FIGURE 1 | Presented stimulus. (A) The shape of a 3-loop-figure that was 
simulated to move horizontally back and forth behind a narrow aperture. 
(B,C) Show snapshots of the figure that can be seen through the simulated 
aperture at two different points in time. Although only line segments are 
visible at each point in time the subject is able to perceive the figure as a 
whole. At a specific width of the aperture the participant's conscious 
perception switches spontaneously between vertically jumping line 
segments and the horizontally moving figure. From (B) to (C) the figure 
moves to the left. 



BOLD-modulations for the tracking of spontaneously changing 
states of awareness. 

In vision, occlusion of a moving object, a situation common in 
everyday vision, requires integration of sequentially visible object 
fragments into the percept of a coherent object. The stimulus we 
employ in this study, mimics this occlusion problem (Figure 1). 
It simulates an outline figure moving horizontally back and forth 
behind an occluder with a narrow vertical aperture, a situa- 
tion similar to looking through a door that is only open a slit. 
Importantly, with our stimulus an observer's conscious percept 
intermittently switches between the actually presented sequence 
of object parts, short line segments moving vertically, and a 
horizontally moving integrated object that is located behind an 
occluder. In previous studies (Fendrich et al., 2005; Rieger et al., 
2007) we have shown that under free viewing conditions the inte- 
grated object percept is constructed post-retinally in the brain and 
that eye movements do not contribute to the construction of the 
integrated object. Murray et al. (2002) investigated the effects of 
dynamic bistable figure integration on the BOLD-activation level 
in human VI. The subjects observed a diamond shaped outline 
figure moving behind apertures. The authors reported a decrease 
of BOLD-activation in VI while subjects perceived the integrated 
figure and conclude that this reduction is in concordance with 
predictive coding theory. Predictive coding theory states that rep- 
resentation of simple stimulus features in early sensory cortices 
is explained away by feedback from higher areas involved in the 
creation of derived percepts, such as occluded objects. Fang et al. 
(2008) and de-Wit et al. (2012) reported activation changes in 
the lateral occipital complex (LOC) with opposite sign than those 
in VI. Unfortunately, all these authors reported only BOLD- 
modulations in a small pre-selected set of visual brain areas, and 
therefore do not address the extent and functional characteristics 
of the brain networks that carry information about the content of 
visual perception and are thus likely to be involved in establishing 
the current percept of the ambiguous stimulus. 

In the auditory domain, object formation from sequentially 
presented tones requires temporal integration, similar to the 
occluded-object stimulus we are using in the current study. Whole 
head fMRI suggests roles for intraparietal sulcus in modality 
independent structuring of sensory input Cusack (2005) and 
feedforward-feedback interactions between cortical and thalamic 
regions during percept switching (Kondo and Kashino, 2009). 
In addition, neuromagnetic measurements indicate that non- 
primary auditory areas maintain the representation of auditory 
streams (Gutschalk et al., 2005). Furthermore, left auditory cortex 
was found to play a dominant role in active auditory stream seg- 
regation (Deike et al, 2010). Interestingly, Pressnitzer and Hupe 
(2006) suggest that although in vision and audition object for- 
mation and perceptual bistablity may have different underling 
neuronal correlates, both modalities may share similar functional 
mechanisms with similar dynamics. Thus, a method suitable to 
establish a tight link between BOLD-activity and the state of 
awareness in bistable visual object formation will likely be useful 
to investigate bistable auditory object formation. 

Our first goal in this study is to determine if the brain activa- 
tion changes linked to the two conscious percepts of our ambigu- 
ous display are robust enough to allow a reliable determination of 



the current content of conscious visual perception. It should be 
noted that activation changes that occur exclusively in the visual 
system will not necessarily be the most informative indicators 
of the actual conscious visual percept. Therefore, we use whole- 
brain fMRI scans and multivariate machine learning techniques 
to determine brain activation patterns that discriminate between 
subjective perceptual states (Norman et al, 2006). As an ultimate 
proof of concept our main goal was to perform the discrimination 
online. In our asynchronous online prediction scheme no prior 
knowledge about the time of percept switches is available which 
makes tracking of percepts more challenging than prediction of 
synchronous perceptual events (e.g., Hollmann et al., 2011). To 
date, real-time fMRI has been employed in several tasks to predict 
brain states (LaConte et al., 2007; Hollmann et al., 2011) and to 
provide online feedback (Weiskopf et al., 2007; Weiskopf, 2012). 
However, we are unaware of a study that shows the feasibility of 
rtfMRI discrimination of the contents of visual perception with 
ambiguous stimuli. Furthermore, we propose a method to test the 
generalizability of the multivariately assessed brain patterns over 
subjects. Our third goal was to characterize the brain networks 
that allow for discrimination between the perception of integrated 
objects vs. object parts and to establish whether the relative activa- 
tion levels in this network are concordant with predictive coding 
theory. The methods we apply may also be applicable for discrim- 
inating bistable auditory percepts, provided that effect sizes and 
temporal dynamics are similar. 

MATERIALS AND METHODS 
STIMULI AND STIMULUS PRESENTATION 

The stimulus was the simulation of an outline loop-figure moving 
horizontally behind a narrow vertical aperture with invisible bor- 
ders (Figure 1). It was back projected onto a translucent screen 
with a JVC DLA-G150CL projector and viewed by the subject via 
a mirror. The eye-screen distance was 59 cm. 

The width of the aperture was individually adjusted such that 
observers had intermittent percepts that spontaneously switched 
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between the physical stimulus, short line segments with chang- 
ing orientation moving vertically up and down, or an integrated 
but occluded object moving horizontally back and forth behind 
an aperture (Fendrich et al., 2005). The height and width of the 
outline figure was 6.8 and 8.7° visual angle, respectively. The 
individual aperture widths ranged between 3.7 and 7.0% of the 
outline figure width. We will denote the percept of the physical 
stimulus as the lines percept and the percept of the integrated 
figure as the object percept. 

Subjects freely viewed the display and reported their percept by 
holding one of two buttons pressed throughout the whole interval 
the percept lasted. We found in earlier investigations that under 
free viewing conditions eye movements tracking the horizontal 
object movements were negligibly small and did not influence 
the quantity or quality of object percepts (Fendrich et al, 2005; 
Rieger et al., 2007). The assignment of the button presses to the 
line and object percepts was switched after each run to avoid a 
correlation of any purely motor BOLD-activations produced by 
the button presses with the activations produced by changes in 
the conscious percepts. In real-time mode, we presented visual 
feedback by switching line color slightly and auditory feedback by 
presenting the decoded percept as spoken word from an audio file 
in the moment of switch detection, respectively. 

SUBJECTS AND EXPERIMENTS 

Data were acquired in two experiments which we will refer to as 
offline experiment and online experiment. In the offline experi- 
ment, we used conventional fMRI to record the BOLD-activations 
while subjects viewed the stimulus and reported their subjective 
percepts via button presses. Fifteen subjects participated in this 
experiment (8 female, 7 male, mean age = 26.3 years). The Data 
of this experiment we analyzed only in an offline mode as it is con- 
ventionally performed in fMRI. In order to prove the reliability 
of our approach, we conducted a second experiment. The online 
experiment was designed to track the current content of con- 
scious perception using real-time fMRI BOLD-measurements, 
and we performed both an online and offline data analysis. Ten 
subjects participated in this experiment (6 female, 4 male, mean 
age = 25.7 years). 

The experiments were approved by the ethics committee of the 
Medical Faculty of the Otto-von-Guericke University. All subjects 
gave their informed consent prior to the start of the experiment. 
They had normal or corrected to normal vision, and were paid for 
participation. 

MR-SCANNING 

A Siemens-Trio scanner equipped with an 8-channel phased array 
head-coil was used for anatomical and functional MRI. Full head 
Tl -weighted anatomical images were obtained with an MPRAGE 
sequence (80 axial slices, slice thickness = 2 mm, field of view = 
256 by 192 mm; inplane matrix = 256 by 192). Functional images 
were obtained from the whole head with a gradient recalled 
echoplanar imaging (EPI) sequence [32 axial slices, slice thick- 
ness = 4 mm, field of view = 200 mm; inplane matrix = 64 
by 64; TR = 2.5 s (offline experiment) and 2.0 s (online experi- 
ment), echo time = 30 ms (offline experiment) and 27 ms (online 
experiment)]. Each run lasted 420s, and each subject completed 



between 7 and 8 runs in the offline experiment and 8 runs in the 
online experiment. 

SUPPORT VECTOR MACHINE LEARNING 

Support vector machines (SVMs) represent a class of machine 
learning algorithms characterized by good generalization perfor- 
mance even in high dimensional feature spaces (Vapnik, 1998). 
This is achieved because the internal regularization avoids overfit- 
ting so that the complexity of the classifier does not depend on the 
complexity of the feature space (Cherkassky and Mulier, 1998). 
Therefore, SVMs are a suitable approach for analyzing fMRI sig- 
nals, particularly when a whole head analysis is preferred. The 
central idea of the algorithm is to maximize the distances from 
the training data to the separating hyperplane which is character- 
ized by its normal vector w, also referred to as the weight vector, 
and a bias parameter b. With these parameters estimated from the 
training data, labels y, of new data 5; can be predicted by a simple 
inner product with the function: 

ji = sign (w ■ x) + b) (1) 

Note that the inner product between the classifier weights and the 
measured data can be interpreted as a weighted "voting" of each 
voxel for one or the other class. Due to the properties of the inner 
product, votes for positive and negative classes can be generated 
by multiple combinations of positive or negative weights with 
positive or negative voxel values. It is thus difficult to interpret the 
sign of the weight. A comprehensive introduction to support vec- 
tor machine learning is provided in Burges (1998). In our study 
we pre-selected a relatively high number of voxels and assigned 
the signal change of each voxel as input data for a linear SVM. 
Labels for both training and testing of the SVMs were derived 
from the button presses shifted 5 s forward in time to account for 
the delay of the hemodynamic response function (HRF). 

DATA PROCESSING 
Preprocessing 

In the offline discrimination of line vs. object percepts we 
employed several preprocessing steps using the SPM5 pack- 
age (Wellcome Department of Cognitive Neurology, University 
College London, UK). We corrected slice acquisition time and 
head movements, and we spatially normalized the individual 
brains to the MNI standard brain. Finally, we spatially smoothed 
the functional data with an 8 mm FWHM Gaussian kernel. In the 
temporal domain we band-pass filtered the voxel time series. We 
set the high-pass cut-off frequency to 1/128 Hz and determined 
the low-pass cut-off frequency between 1/27 and 1/8 Hz individu- 
ally as described below. We discarded voxels exceeding 10% signal 
change from the analysis because at 3T they are likely to stem from 
large vessel contributions. 

Offline-classification 

In the offline analysis which was applied to both experiments, 
we performed a leave-one-run-out cross-validation rather than 
using a leave-one-volume-out or random-selection of volumes 
approach to test for generalizability. This avoids information 
transfer between training and test set due to temporal correla- 
tions in the slowly varying BOLD response. In this procedure, all 
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runs but one are used for feature selection, classifier training, and 
filter optimization. The decoding accuracy of the trained classifier 
is then tested on the reserved test set. 

We used a univariate statistical approach for feature selection 
to reduce computational costs. Therefore, we calculated model 
BOLD-responses from the reported object and line percepts, 
regressed them on to the voxel BOLD-time courses and calcu- 
lated univariate P-tests to assess statistical difference between 
the BOLD-amplitudes estimated for the two percepts. Finally, 
the 5000 voxels with the lowest p-value were selected for the 
classification step. 

In addition to feature selection, the cutoff frequency of the 
low-pass filter was optimized in the cross-validation loop to 
minimize high frequency noise and to account for individual vari- 
ations in the duration of the percepts. We tested the classification 
performance on the training set with four low-pass cutoff fre- 
quencies 1/27, 1/18, 1/12 and 1/8 Hz and selected the cut-off with 
the smallest classification error in cross validation performed on 
the training set. Although minimal loss on the training set does 
not necessarily imply optimal performance on the test data, we 
found this approach appropriate. 

The trained classifier was then applied to each EPI-volume in 
the left out test run to determine the subject's current conscious 
percept from the BOLD-activation patterns. For evaluation of 
the classifier's discrimination performance we compared the time 
course of the subject's button presses shifted by 5 s to take into 
account the HRF delay. For classification and cross-validation we 
combined the Princeton MVPA toolbox (http://code.google.com/ 
p/princeton-mvpa-toolbox/) with the Spider machine learning 
toolbox (http://www.kyb.tuebingen.mpg.de/bs/people/spider). 

Online-classification 

Online analysis is the ultimate test of the ability to track the 
phenomenal content of ambiguous perception time resolved 
from BOLD-activations. We implemented a real-time fMRI- 
classification experiment based on the experimental description 
language EDL (Hollmann et al., 2008). Data acquisition was 
performed with the same parameters as in the offline experi- 
ment except for shorter TRs. The MR scanners' EPI scanning 
protocol was modified to immediately export the acquired and 
motion corrected volumes (Hollmann et al., 2008). On the clas- 
sification host, transformation matrices for spatial normalization 
were calculated from the first acquired EPI-volume and applied 
to all subsequently acquired volumes. After normalization, the 
EPI- volumes were spatially smoothed (8 mm FWHM Gaussian 
kernel). Stimulus presentation and classification were performed 
on two separate computers that communicated via RS-232 con- 
nection. 

The first two runs of each scanning session were used for 
feature selection and for training of the initial classifier. First, 
the BOLD time series were zero centered and the baseline val- 
ues were kept for subsequent subtraction to approximate a zero 
centered signal online. Then the time series were temporally fil- 
tered applying a digital fourth order butterworth band-pass filter 
with cutoff frequencies 1/128 and 1/16 Hz. Initial univariate fea- 
ture selection was performed similar to the procedure described 
in the offline experiment except that we applied an ANOVA as 



statistical test and retained the voxels with the smallest 10,000 
p-values for a first pass of classifier training. We used the initial 
classifier for a second, multivariate feature selection in which we 
retained only voxels with feature weights exceeding a threshold 
w t h which was defined at w t h = 10 _1 max \wk\ , k = 1 ... 10. 000. 
The resulting feature space was used for further analysis includ- 
ing retraining on the current data set. Online discrimination of 
the content of perception started with the third run. Auditory 
feedback was provided by playing the respective word (female 
voice speaking) when a switch in the state of the percept was 
detected. Furthermore, the according state was continuously indi- 
cated by slightly changing the color of line segments to greenish 
for decoded object percepts and to reddish for decoded line per- 
cepts. Analogous to the offline experiment, subjects reported their 
perceptual state by button presses. Moreover, we instructed the 
participants to mentally assess the correctness of the decoded 
percept, but to consider a delay of approximately 8 s, expectable 
due to the hemodynamic response delay and data processing 
time. Eight runs were acquired for each subject. The classifier was 
updated after every run starting from the third. 

Cross-subject generalization 

In analogy to the classical statistical approach, we aimed to 
use multivariate classification to derive predictive patterns of 
BOLD-activation that would generalize across individual sub- 
jects. Multivariate pattern analysis often focuses on individual 
discriminative patterns within a small circumscribed brain area 
(e.g., Kay et al., 2008). Therefore, this approach is often not 
commensurate to the attempt in classical statistical analysis to 
generalize experimental results to a population. Our approach 
aims to derive patterns of BOLD-activation from a group of sub- 
jects that will generalize to new subjects. The generalization is 
tested by showing that the patterns found can discriminate con- 
scious perceptual contents with better than chance performance 
in data from new subjects. This last test of generalization perfor- 
mance is typically omitted in classical statistics but critical for 
the evaluation of the relevance of brain activation patterns at 
the population level, as it is performed with standard parametric 
univariate testing. 

In our approach, we used a leave-one-subject-out cross- 
validation. Because the total number of EPI-volumes (19,188 
in the offline experiment and 16,480 in the online experiment, 
respectively) is prohibitively large for our whole head analysis 
approach and considering that classification appears to be more 
reliable distant from perceptual switches, we reduced the amount 
of training data by averaging over three EPI-volumes around the 
center of each interval of a sustained percept. This reduced the 
data set to 2526 EPI-volumes in the offline experiment and 1412 
EPI-volumes in the online experiment, respectively. 

For feature selection we trained an SVM on all available 
volumes of a single subject involving 30,000 voxels with lowest p- 
values derived from an F-statistic. Then we calculated a weighted 
combination of each training subject's weight vector 



Frontiers in IMeuroscience | Auditory Cognitive Neuroscience 



May 2014 | Volume 8 | Article 116 | 4 



Reichert et al. 



Online tracking of conscious perception 



In this equation, w k is the individual weighting of the k th feature 
in the normal vector iv, and pi is the rate of correct classifica- 
tions achieved with the i th of n subjects. The result is a map 
of voxels weighted by both the importance of the voxel for the 
individual subjects and the relevance of the whole individual 
voxel map for the discrimination of the perceptual content. To 
perform the group analysis, we generated these maps in cross- 
validations and selected the 30,000 highest weighted voxels from 
the combined map. 

Permutation testing 

We evaluated the reliability of the classifier's discrimination rate 
with respect to the empirical guessing level and its 95% con- 
fidence obtained with a permutation test. Only discrimination 
rates exceeding the 95% confidence interval for guessing were 
considered reliable (Good, 2005). Note that the mean empirical 
guessing levels can substantially deviate from the expected level 
(50% in the two class case), and may have wide confidence inter- 
vals (see e.g., Rieger et al., 2008). For permutation testing, the 
available labels, line, or object percept, were randomly reassigned 
among the measured EPI-volumes. Ideally, this procedure should 
prevent the classifier from learning information useful for sep- 
arating the classes. However, even though labels are randomly 
assigned to the EPI-volumes in most cases the classifier will dis- 
criminate line and object percepts with a rate different from 50% 
correct. The critical question is, whether the discrimination rate 
obtained with the actually measured data set is within the confi- 
dence interval for discrimination rates that can be obtained with 
randomly assigned labels. However, since we discriminate events 
in a continuous time series the permutation scheme should con- 
sider temporal correlations due to the distribution of durations 
of the line and object percepts. We retained these durations in the 
permutation samples by permuting full blocks of consecutively 
equal sample labels. The empirical guessing levels were estimated 
within the above described cross-validation framework in a total 
of 500 random permutations. 

A second application of permutation testing is to generate a 
distribution of classifier weights that can be obtained by separat- 
ing the data without knowledge of actual class labels. We used 
this method to determine the significance that a feature weight 
deviates from a randomly obtained weight. According to Mourao- 
Miranda et al. (2005) we calculated thep-value, denoted p w , as the 
ratio of number of features exceeding the weight of the classifier 
in the distribution of randomly obtained weights for this feature 
to the number of permutations. A low value for p w indicates that 
the voxel contributes discriminative information to the classifier. 

RESULTS 

BEHAVIORAL RESULTS 

The distribution of the duration of the perceptual interval lengths 
in bistable percepts can be approximated by a gamma probabil- 
ity density function (pdf) (Kleinschmidt et al., 1998). Figure 2 
shows the histograms of the interval durations in the two exper- 
iments and the fitted gamma pdf. The median duration in the 
online experiment was 21.4 s (Figure 2B). The maximum of the 
fitted gamma pdf is at 18.3 s (goodness of fit: _R 2 = 0.8416). In 
the offline experiment the median interval duration was 12.1s 



(Figure 2A) and the maximum of the gamma pdf is at 6.9 s (good- 
ness of fit: -R 2 = 0.9851). The longer perceptual intervals in the 
online experiment were presumably due to familiarizing the sub- 
jects with the stimulus before the beginning of the experiment. 
The average proportion of object percepts does not differ between 
the online and the offline experiment [offline: 53.9%, online: 
51.7%, two-sample f-test: f( 2 3) = 0.9, p = 0.37]. 

ACCURACY OF THE DECODED PERCEPTS 

Averaged over all subjects, the phenomenal content of perception 
(lines or object) was correctly determined 79.2% of the time (EPI- 
volumes) in the offline experiment, with a standard error of the 
mean (SE) of 2.6%. Figure 3A shows the individual results for 
all 15 subjects including the empirical 95% confidence intervals 
for guessing. In the best subject, the classifier tracked the phe- 
nomenal content nearly perfectly, at 94.4%. Most errors in this 
subject occurred in EPI-volumes around the time of a percep- 
tual switch (Figure 3B). This observation was also found in other 
subjects and is supported by the observation that without the 
three volumes around a perceptual switch the decoding accuracy 
over all subjects increases to 84.8% and that the decoding accu- 
racy dropped to 72.4% when we considered only the EPI-volumes 
around the switches for accuracy calculation. This suggests that 
the limited temporal sampling of fMRI (here 0.4 Hz sampling 
rate) may contribute to the decoding error. Importantly, in every 
subject the classification rate clearly exceeds the 95% confidence 
interval for guessing. The theoretical 50% guessing level is always 
included in the 95% confidence interval determined by the per- 
mutation test. This indicates that there is no bias for one class in 
the data sets (Rieger et al., 2008). 

ONLINE TRACKING OF CONSCIOUS PERCEPTION 

Online, time resolved discrimination of the phenomenal con- 
tent of the subjects' percepts was possible with great precision. 
Importantly, the online prediction started after only 14min of 
training data acquisition (412 samples). On average 82.8% (SE: 
2.4%) of the time the classifier correctly determined whether 
the subject currently perceived lines or an integrated object. 
Classification rates ranged from nearly perfect (94.2%) for the 
best subject to 65.9% for the worst. Although the classifier was 
retrained after each run, decoding accuracies did not signifi- 
cantly change over runs (average linear regression slope —0.24%, 
p > 0.05). In concordance with the offline experiment, most dis- 
crimination errors appeared around the perceptual switch (aver- 
age decoding accuracy increases to 87.0% correct when ignoring 
three volumes around switch and decreases to 70.4% correct 
when taking only switch volumes into account), indicating that 
a considerable proportion of the errors might occur due to slow 
hemodynamic response and a relative low sampling rate of fMRI. 

The average temporal delay from the beginning of the EPI- 
volume acquisition to feedback was 4.3 s (SE: 0.05 s). About half 
of this delay was due to the EPI-acquisition (TR = 2 s). Most 
of the remaining delay was required for data transfer and signal 
processing. Assuming a hemodynamic delay of 5 s, feedback was 
delayed by 9.3 s. The median duration of a phenomenal percept 
was 21.4s and only 4.5% of the percept durations were shorter 
than this delay. Subjects reported that they could readily relate the 
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FIGURE 2 | Perceptual switch intervals. Histograms show the relative 
occurrence how long a perceptual state lasted for (A) 15 participants of 
the offline experiment and (B) 10 participants of the online experiment. 
A gamma probability density function (pdf) was fitted to the 



distributions. The median duration of one percept in the offline 
experiment was 12.1s (pdf shape: 2.00; scale: 7.51). In the online 
experiment the median duration of the percept was 21.4s (pdf shape: 
5.29; scale: 4.26). 



feedback to their phenomenal percepts despite the delay. The fea- 
ture selection scheme described in section Online-classification 
revealed an average feature set size of 3997 voxels (SE: 215 voxels). 

The classification rates obtained in an online experiment are 
most likely sub-optimal. One reason is that we stopped feature 
selection after the second run. To test if even more reliable infor- 
mation about the momentary conscious percept can be revealed 
by optimization of the signal processing chain we performed an 
additional offline leave-one-run-out cross-validation analysis in 
which we optimized preprocessing parameters as described in 
section Offline-classification. On average the decoding accuracy 
increased by 6% to an accuracy of 88.8% (SE: 1.9%; 74.5-94.2%) 
(Figure 4). All classification rates clearly exceeded the individually 
determined 95% confidence intervals for guessing. Importantly, 
the decoding accuracy is higher when subjects received feedback 
compared to the experiment without feedback presentation. Our 
results clearly show that fMRI in combination with advanced 
machine learning approaches makes it possible to track the phe- 
nomenal content of a subject's percept online with substantial 
time resolved accuracy. 

GENERALIZATION OVER SUBJECTS 

In a final analysis we investigated whether a classifier derived from 
a population of subjects, would be capable of predicting the phe- 
nomenal content of a new subject's percept. This is of particular 
interest for brain computer interface related research. We trained 
the classifier in a leave-one-subject-out cross-validation scheme 
as described in section Cross-subject generalization. When we 
excluded the EPI-volumes around perceptual switches for train- 
ing and evaluation, using only the EPI-volumes with the most 
reliable labels, we obtained 78.1% correct classifications for the 
data from the offline experiment and 79.4% accuracy for the 
data from the online experiment. Including the EPI-volumes 
acquired around switches in the evaluation yielded 68% accu- 
racy with the data from the offline and 73.11% with the data 
of the online experiment. In this analysis accuracy exceeded the 
guessing level for all but one of the subjects. The result suggests 
that a classifier derived from a population can indeed transfer to 



new subjects, although information loss occurred compared to 
the single subject analysis. 

BRAIN AREAS INVOLVED IN PERCEPT DISCRIMINATION 

An essential question is what brain networks discriminate 
between the contents of conscious percepts in dynamic object 
integration. We analyzed the trained classifiers to address this 
question. In case the input features are scaled comparably, it is 
possible to interpret the feature weights learned with linear SVM 
as informative voxels in a brain network that discriminates the 
contents of the percepts. 

To determine if a voxel contributed significantly to discrim- 
ination of percepts we used a combined multivariate and uni- 
variate approach. First we calculated a multivariate p-value p w 
for each voxel by non-parametric permutation testing involv- 
ing all subjects from both experiments using the group data 
sets described in Cross-subject generalization and labels "line" 
or "object" percept permuted among the BOLD-data. This pro- 
cedure provides a null distribution of weights for each voxel. 
We trained 1000 classifiers for both group data sets. In the next 
step we used the weights from the classifiers trained with the 
actually measured labels and the permutation distributions of 
weights to calculate the probability p w that the weights obtained 
with the "correct" labels were observed by chance. In Figure 5 
we show a map of the voxels that reached a weight-threshold 
of p w <0.1 in the conjunction of both experiments. Second, 
in order to show BOLD change direction and univariate effect 
sizes we calculated f-values for each voxel. Red to yellow indi- 
cates that BOLD-activity in the voxel is higher during object 
percepts and blue to cyan indicates lower BOLD-activity during 
object percepts. The two sided f-value for a conventional univari- 
ate threshold of p < 0.001 would correspond to ("(3936) < —3.2 
or £(3936) > 3.2, which applies to 88.6% of the voxels shown in 
Figure 5 as well as to all of the clusters shown. Note that the 
maps are smooth and unvariate effects are uniform in each clus- 
ter. This smoothness indicates systematic effects over voxel in 
each cluster. Moreover, the combination of multivariate classi- 
fier training with univariate analysis suggests that the classifier 
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FIGURE 3 | Accuracy of offline tracking. (A) Single subject decoding 
accuracy obtained by a leave-one-run-out cross-validation. On average 
79.2% (SE: 2.6%) of the acquisition time the decoded percept was correct. 
For each participant the mean guessing level as determined by permutation 
testing is approximately 50% and the fraction of correctly decoded 
perceptual states exceeds the 95% confidence interval of the guessing 
level. Participants are sorted by performance. (B) Time course of the 
distance (arbitrary unit) of the data points in classification space to the 
separating hyperplane constructed by a SVM for participant d, Run 5 
(92.7% accuracy). White background signifies intervals in which the 
participant indicated perceiving the integrated object and gray background 
signifies intervals with line percepts. EPI-volumes for which the conscious 
percept was correctly identified are shown as circles, incorrectly identified 
percepts as crosses. 



does not only cancel out correlated noise in voxels with reliable 
weights. 

Brain areas that contributed significantly to the classification 
include relatively early retinotopic visual cortex [MNI coordi- 
nates (5, —77, 8), £(3936) = —23.0] and the middle temporal MT+ 
[MNI coordinates (47, -70, 13) and (-39, -65, 12), £( 3936 ) = 
— 14.5 and £(3936) = —13.7] where BOLD activity is lower dur- 
ing the object perception, and ventral visual stream areas such 
as putative lateral occipital complex [LOC, MNI coordinates 
(32, -88, 13) and (-34, -88, 8), f (39 36) = 4.6 and f (39 36) = 
7.4] where BOLD activity is higher during the object percept. 
However, the whole head classification approach suggests that 
the discriminative network in most subjects includes brain areas 
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FIGURE 4 | Accuracy of online tracking. Single subject decoding accuracy 
obtained during the online decoding experiment with feedback. On average 
82.8% (SE: 2.4%) of the time the content of the conscious percept was 
correctly decoded. An additional offline cross-validation procedure applied 
to the same data revealed 88.8% (SE: 1.9%) accuracy on average. 
Participants are sorted by performance. 



that are not primarily visual such as the lateral intraparietal sul- 
cus [IPS, MNI coordinates (27, -57, 63) and (-22, -58, 58), 
£ ( 393 6) = 15.9 and £(3936) = 14.3], the frontal eye fields [FEF, MNI 
coordinates (30, -9, 51) and (-28, -7, 53), £ (3936 ) = 7.5 and 
£(3936) = 13.2], and sporadic small clusters of voxels in the frontal 
lobe. 

DISCUSSION 

In this work we show that it is possible to use rtfMRI to reliably 
track online the current content of conscious visual perception 
while observers view ambiguous stimuli. The reliability of the 
approach is indicated by permutation tests and the observa- 
tion that errors occurred primarily close to the time of percep- 
tual switches. Importantly, we found that with our approach 
the trained classifiers generalize over subjects. Our approach to 
investigating the neural networks that underlie perceptual rep- 
resentation is data driven as well as hypothesis generating. This 
distinguishes it from earlier approaches that have focused on 
hypothesis testing (Kleinschmidt et al., 1998; Murray et al., 2002; 
Yin et al, 2002; Haynes and Rees, 2005). Our approach revealed 
novel components of a network that is informative about the cur- 
rent content of visual perception and allowed the relevance of 
these components to be quantified in terms of their decoding 
accuracy. 

LINKING SUBJECTIVE PERCEPTS WITH OBJECTIVE MEASUREMENTS 
OF BRAIN ACTIVATION 

There is a long standing debate as to whether subjectively per- 
ceived qualities can be related to objectively observed neural 
activations. Several decoding studies have successfully shown that 
certain stimulus features can be decoded from BOLD-activity (for 
reviews see Rees et al., 2002; Tong et al., 2006; Tong and Pratte, 
2012). However, it is not always clear in these studies if infor- 
mative changes in brain activation relate to encoding of visual 
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FIGURE 5 | Significance of classifier weights and activation 
differences. Brain patterns were extracted from the group classifiers 
of both offline and online experiments and overlaid on an MNI 
template. Brain regions were masked by p-values of classifier 



weights, thresholded at p w <0.1. P-values were obtained in a 
permutation test. Hot colors indicate higher BOLD signal during 
object percepts and cool colors indicate lower BOLD signals during 
object percepts. 



features or subjective percepts. A particularly strong case is made 
for decoding of subjective awareness when changes in the sub- 
jective state are uncorrelated with the physical stimulus (e.g., 
Grill-Spector et al., 2000; Chang et al., 2010). Here we employed a 
visual stimulus that is either perceived as a pair of vertically mov- 
ing lines or as an occluded object moving behind a narrow slit. 
Importantly, earlier studies (Fendrich et al., 2005; Rieger et al., 
2007) demonstrated that eye movements did not contribute to 
the formation of the object percept. This implies that the changes 
in brain activation we tracked reflect brain processes related to 
changes in subjective experiences rather than changes in the phys- 
ical stimulus. We show to our knowledge for the first time that 
spontaneous changes in subjective, stimulus independent experi- 
ences can be tracked online with high accuracy, good temporal 
resolution (0.5 Hz sampling rate), and over an extended time 
period (> 1 h) using real time fMRI. The high decoding accuracies 



we obtained, with up to 94% correct predictions, accentuate the 
strength of the link between neural network states and states of 
subjective perception. 

The informative patterns of brain activations we found with 
our approach generalized well over subjects. This is not gen- 
erally the case and may depend on the spatial resolution of 
the attempted analysis (e.g., Haxby et al., 2001) or the mea- 
surement technique employed (Fazli et al., 2009). We found 
that functional MRI combined with normalization to a standard 
brain and slight smoothing is a successful strategy for revealing 
informative networks at the scale of the full brain that trans- 
fer well between subjects. Such networks likely reflect general 
mechanisms underlying perceptual representation. The relative 
importance of individual vs. general (between-subject) features 
of brain organization and brain processing strategies can be esti- 
mated by assuming that the accuracy obtained with an individual 
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classifier reflects maximum obtainable performance, given a par- 
ticular classification approach. The relative reduction in accuracy 
produced by the between-subject generalization then indicates 
the importance of individual features. In our investigation this 
reduction was statistically significant but almost all of the decod- 
ing accuracies achieved with left-out subjects were well above 
chance level. This is important because it indicates that patterns 
generalizing over subjects have a meaningful physiological inter- 
pretation. Together these results indicate that our data driven 
approach can be used in combination with ambiguous stimuli to 
reveal brain networks that are tightly linked to subjective percepts. 

BRAIN NETWORKS WITH DECREASED BOLD-ACTIVITY DURING 
INTEGRATED OBJECT PERCEPTS 

Analyzing the importance of individual brain voxels for the dis- 
crimination of the line vs. object percepts revealed an extended 
posterior brain network in which activation in lower tier sen- 
sory brain areas decreases during object percepts and activation 
in higher tier brain areas tends to increase. The results from our 
data driven approach confirm and considerably extend previous 
studies. 

In concordance with earlier studies using similar ambiguous 
stimuli (Murray et al, 2002; Fang et al, 2008; de-Wit et al, 2012) 
we find a reduction of brain activity in early retinotopic visual 
areas, that represent the physical stimulus features such as line 
length and orientation. Concordant with de-Wit et al. (2012) we 
observe that these activation reductions extend around calcarine 
sulcus into the depth of the hemispheric cleft. In our study we 
also find reduced activation during object percepts in an area on 
the lateral surface, in the ascending limb of the inferior tempo- 
ral sulcus, a reliable landmark for human motion sensitive area 
MT+ (Dumoulin et al, 2000) and in the vicinity of the fusiform 
gyrus, considered to be involved in higher level visual object rep- 
resentations (Ishai et al, 2000). Murray et al. (2002) argue that 
the reduction of activation in VI during integrated object per- 
cepts is in accord with predictive coding theory (Mumford, 1992; 
Rao and Ballard, 1999). In this theory, higher tier areas con- 
vey a prediction of the features of the object represented to the 
lower tier areas which signal the difference between the predic- 
tion and the actual stimulus. This error signal is fed forward to the 
higher tier areas to adjust the object representation. The smaller 
the error between the prediction from higher tier areas and the 
stimulus representation in early sensory areas, the lower is their 
activation. Murray et al. (2002) suggested that during integrated 
object percepts higher tier brain areas predict the representation 
of the sensory stimulus in earlier areas and, as a consequence, 
the activation drops in lower tier areas. The reduced activation 
in hMT+ during object percepts fits with this interpretation. In 
earlier studies (Fendrich et al, 2005) we could show that during 
object percepts, the brain derives knowledge of horizontal object 
motion and could predict the movement of the integrated moving 
object from the movement of the segments. Similar to our results 
Castelo-Branco et al. (2002) found increased MT+ activity while 
the movement of two plaid stimuli was perceived as belonging to 
different objects and decreased activity when they were perceived 
as one integrated object. However, more detailed analysis reveals 
two effects that may require further elaboration of the predictive 



coding explanations of the neural deactivation patterns. First, 
we find reduction of brain activation during object percepts in 
ventral brain areas which are considered object selective and 
located relatively high in the processing hierarchy, according to 
standard theories of visual processing (Ungerleider and Mishkin, 
1982). Conversely, within the predictive coding framework, the 
reduction of activation during object percepts suggests that these 
brain areas are located relatively low in the processing hierarchy. 
Second, as de-Wit et al., 2012 already pointed out, the reduc- 
tion of activation in early visual cortex may extend beyond the 
retinotopic representation of the stimulus. 

BRAIN NETWORKS WITH INCREASED BOLD-ACTIVITY DURING 
INTEGRATED OBJECT PERCEPTS 

In concordance with Fang et al. (2008) we find increased brain 
activity bilaterally on the lateral surface of the occipital lobe dur- 
ing object percepts. This brain area is localized in the anatomical 
region of the LOC, which has been linked to the construction 
of object shape (Kourtzi and Kanwisher, 2000). The other areas 
with increased information were located toward and in parietal 
cortex. The function of the parietal areas is less clear. Recent stud- 
ies suggest a functional role of parietal areas in the grouping of 
sequentially presented elements into coherent object percepts. For 
example, Zaretskaya et al. (2013) reported increased activation in 
anterior intraparietal sulcus (alPS) during grouping of local ele- 
ments into objects, and Peltier et al. (2007) reported LOC and IPS 
activation during haptic shape perception in which the sequen- 
tial haptic information about shape has to be integrated into a 
coherent object percept. These parietal activations are not unique 
to the visual modality. Cusack (2005), using an auditory stream- 
ing paradigm, reported increased activity in IPS during object 
integration and hypothesized that the intraparietal area is gen- 
erally involved in structuring sensory information. Moreover, we 
consider it highly unlikely that the parietal activation patterns 
can be explained by eye movements alone. In two earlier stud- 
ies (Fendrich et al, 2005; Rieger et al., 2008) we characterized eye 
movements during the line vs. object percepts using highly accu- 
rate eye tracking methods and found only a small difference in the 
eye movement patterns between lines and object percepts. 

Together, the results from our data driven approach suggest a 
working hypothesis: the LOC together with parietal brain areas 
are involved in constructing the percept of the integrated object. 
The reduction of activation in early visual areas during object per- 
cepts might be an indication of an interaction between higher and 
lower tier brain areas in which the former predict the representa- 
tions in the latter. However, the spatial extent of the activation 
reduction in early retinotopic cortex and the function of ven- 
tral object selective cortices in the visual hierarchy require further 
evaluation. 

REAL TIME vs. OFFLINE fMRI-ANALYSIS 

Real-time analysis allowed us to feed the decoded percepts back 
to the subject. For such biofeedback experiments it is neces- 
sary that the classification method provides high generalization 
performance and that it requires little training data. We found 
that the average accuracies in the online and offline experiments 
were comparable, with a slight tendency for higher accuracies 
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in the online experiment. Theoretically, the online classification 
should perform worse than the cross-validation method because 
it uses a smaller training set. Subsequent cross-validation analysis 
of the online data confirmed that this offline approach can fur- 
ther increase the accuracy obtained with the online data set. We 
speculate that the reason of the accuracy advantage in the online 
experiment is due to the feedback which may have increased the 
subject's sense of agency. Receiving a feedback of the current con- 
scious perceptual state could reduce mental fatigue and thereby 
reduce noise, which is essential for good classification perfor- 
mance. Another factor that may have contributed to the increased 
accuracy is the longer durations of the perceptual intervals which 
may be better reflected in the slow BOLD-response. The obser- 
vation that most classification errors occurred around the time 
of perceptual switches supports this assumption. Studies indicate 
that subjects can volitionally control the switching of the percepts 
(van Ee et al, 2005; Kornmeier et al., 2009). Due to the delay of 
the BOLD-response it takes several seconds until the feedback 
signals a switch in perception. Although the subject's informal 
reports indicated that they could deal well with the delay, too 
short perceptual intervals could have led to more discordant feed- 
back. This may have motivated our subjects to volitionally control 
and extend the duration of the percepts in order to increase the 
synchronization of the percept with the delayed feedback. Thus, 
the feedback may have increased perceptual inertia or cogni- 
tive bias leading to prolonged percepts in the online experiment. 
Further studies are necessary to characterize the neural basis of 
this biofeedback effect. 

AMBIGUITY AND STREAM SEGREGATION 

Ambiguous stimuli can induce bistable perceptual organiza- 
tion that switches between categorically different object percepts 
despite constant visual input. This property makes them ideal 
to investigate processes of perceptual organization independent 
of potential confounds by changing physical stimuli. Here we 
outlined a data driven classification approach to reveal brain 
networks underlying perceptual organization. The classification 
accuracy provides an intuitive measure of the relevance of the 
observed effects. We would like to point out, that the approach 
is not restricted to the visual domain. In the auditory domain, 
multistable stimuli, such as two tone sequences (Bregman, 1990; 
Gutschalk et al., 2005; Deike et al., 2010), play an important role 
in the characterization of processes underlying stream organiza- 
tion. We suggest that the approach presented here, can be used 
to investigate the neural basis of auditory object formation with 
ambiguous auditory stimuli. Support for the assumption that this 
is possible comes from studies showing that the dynamics of audi- 
tory and visual perceptual switches are similar. In both modalities 
percept durations are gamma distributed and have similar lengths 
(Pressnitzer and Hupe, 2006; Denham et al., 2012). Furthermore, 
the different modalities seem to share common brain regions for 
perceptual organization (Cusack, 2005). 
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